Inference and Discovery
in an Exploratory Laboratory 

Valerie Shute 
Robert Glaser 
Kalyani Raghavan 

Introduction 

Formulating and testing hypotheses using observations and empirical findings is
central not only to scientific work but also to the acquisition of knowledge in
general. As new information is obtained and hypotheses are inferred, they serve as a
basis for confirming or refuting perceived regularities and lawful relationships. In the 
research described here, we employ a computer laboratory, which we call an 
intelligent discovery world, to study the strategies students use to explore this 
environment. Our interest focuses on studying individual differences in strategies of
inference and discovery, including comparative studies of successful and less 
successful learners, and eventually studies of tutorial assistance to discovery skills. 

The central problem of induction and hypothesis formation is to carry out 
cognitive performances that ensure that inferences drawn are plausible and relevant 
to the world or system being observed. The plausibility of inductions and stated 
hypotheses can be determined with reference to knowledge obtained about the 
system. Thus the students' process of inference depends on the application of
observation, experimentation, and data organization that enable the specification and
testing of the knowledge obtained through experiments, hypotheses, and
confirmations. As Holland, Holyoak, Nisbett and Thagard (1988) wrote: "The study
of induction, then, is the study of how knowledge is modified through its use" (p. 5).

The kind of learning that we are considering has a reasonably long research 
history in experimental psychology, mostly in the context of laboratory and 
knowledge-lean tasks. In recent years, research has taken place in more complex 
situations, as well as in studies of machine learning, experimental studies, and 
computer simulation of problem solving and discovery tasks (Klahr & Dunbar, 1987; 
Kuhn & Phelps, 1982; Langley, Simon, Bradshaw & Zytkow, 1987). Still, relatively 
little work has investigated the domains taught in schools and formal education. 
Some exceptions are studies of microworlds in physics (Champagne & Klopfer, 1982; 
DiSessa, 1982; White, 1983; White & Horowitz, 1987). 

As indicated, in inductive problem solving, information can be present in the
environment, and the problem solver must attempt to find a general principle or
structure that is consistent with this information. Scientific induction is an important
example of this, as is medical and technical diagnosis in which a set of symptoms is
presented and the task is to induce the fault or cause. To paraphrase Greeno and
Simon's description:

Solving an induction problem can proceed in two ways, and in most tasks 
a combination of the methods is used. A top-down method involves 
generating hypotheses about the structure and evaluating them with 
information about the observed instances. A bottom-up method involves 
storing information about observations and events and making judgments
about new events on the basis of similarity or analogy to the stored 
information. To perform the top-down method, the problem solver requires 
a procedure that generates or selects hypotheses, a procedure for evaluating 
hypotheses, and then a way of using the hypothesis generator to modify or 
replace hypotheses that are found to be incorrect. To use the bottom-up
method, the problem solver needs a method of extrapolating from stored
information, either by judging similarity of new stimuli to stimuli stored in 
memory or by forming analogical correspondences with stored information 
(1984, p. 82). 

To a large extent, classic studies of induction have focused on inducing a rule or 
classifying relatively abstract stimuli into categories on the basis of feedback about 
classification errors and other information (see Pellegrino & Glaser, 1980; Smith & 
Medin, 1981). Given our concern for exploratory environments, we perceive this large 
literature as pertaining, for the most part, to passive induction in which the learners 
induce rules, make hypotheses, and classify and taxonomize observations on the basis 
of sets of pre-determined instances designed by the experimenter. However, a more 
active process is apparent when the learner can select variables, design instances, and 
interrogate his or her existing knowledge and memory for recent events. In the latter 
form of induction, we need a research paradigm that allows us to examine active 
experimentation in which learners explore and generate new data and test hypotheses 
with the data they have accumulated in the course of their investigations. Recent 
experimental technology and computer modeling have made this type of 
experimentation feasible (Bonar, Cunningham & Schultz, 1986; Michalski, 1986; 
Yazdani, 1986). 

In our research program, we have been investigating the learning of topics in 
elementary physics, basic electronics, and economics. In this chapter, we report on 
the economics world, called Smithtown. The environments we design enable us to 
investigate a range of inductive or discovery learning, from learning in purely 
discovery environments to more guided discovery worlds. What we are learning from
our work is that as students explore phenomena, they can be guided and coached in
the interrogation of a subject matter, analyzing their own understandings and 
misunderstandings, assessing progress toward their goals, and revising their problem 
solving and learning strategies. 

Our exploratory systems are designed to record, structure, and play back to 
students their own problem solving processes. Such systems have been developed in 
algebra and geometry, where they provide a structured "trace" of problem solutions
so that students can see the alternative paths that they have tried (Anderson, Boyle,
Farrell, & Reiser, 1984; Brown, 1983). Previous papers report early work (Reimann,
1986; Shute & Glaser, in press) and this paper describes an initial study of individual 
differences in exploration, data collection, and hypothesis formation in an exploratory 
world of microeconomic laws. 

Smithtown is a computer program that provides a discovery environment for
learning elementary microeconomics. An ideal sequence of iterative behaviors in 
Smithtown would include: exploring the world (informally), developing a plan for 
investigation (more formally), choosing on-line tools or techniques for executing the 
plan, collecting and recording data from the experiment, organizing the results, seeing 
if the data confirm or negate prior beliefs, constructing a problem representation, 
modifying the problem based on discrepant results, refining the problem based on 
additional information, recognizing discrepancies between the result and expectations, 
testing out findings in additional realms, and finally, generalizing a principle or law. 

The focus of the study we will be discussing is on students' "inductive inquiry
skills," which in this context refers to the students' effectiveness in collecting,
organizing, and understanding data, concepts, and relationships in a new domain.
This system has been implemented on a Xerox 1108 Lisp machine, allowing self-paced,
individualized, and interactive instruction in a rich data source (see Shute &
Glaser, in press, for an overview of the system).

We hypothesize that discovery learning can contribute to a rich understanding 
of domain information by enabling the student to access and organize information. 
Furthermore, a proposition to be evaluated in this work is that effective interrogative
skills are teachable if the particular skills involved can be articulated and practiced
under circumstances which require them to be used. 

Intelligent tutorial guidance, in conjunction with a discovery world
environment, can potentially transform a student's problem solving performance into 
efficient learning procedures rooted in an individual's own actions and hypotheses. In 
such experiential learning, students interact with new subject-matter situations, 
comparing their observations with their current beliefs and theories, which may be 
rejected, accepted, modified, or replaced (see Glaser, 1984). In the course of this 
developing knowledge, students ask questions, make predictions, make inferences, and
generate hypotheses about why certain events occur with systematic regularity. 
Significant experience of this kind in discovering principles in a field of knowledge 
should alter the relation learners perceive between themselves and the knowledge, 
and their way of behaving when they forget a solution procedure or encounter an 
unprecedented problem (Cronbach, 1966).

We report on the results of an empirical study conducted using Smithtown.
The report is divided into five sections: Knowledge Bases or "Experts" in
Smithtown, Maneuvering through Smithtown with On-line Tools, Learning and
Individual Differences, the Results, and a General Discussion.



Knowledge Bases in Smithtown

The primary purpose of the system is to help students become more methodical
and scientific in learning a new domain. The first knowledge base or "expert" we
will discuss deals with efficacious inquiry skills.

The First Knowledge Base: Inductive Inquiry Skills

An earlier study, conducted with Smithtown, yielded information about more
and less effective behaviors for interrogating and inducing information from a new
domain (reported in Shute & Glaser, in press). This information was subsequently
coded into rules that the system monitors in conjunction with a learner's actual
behaviors. Thus, the system knows of sequences of good behaviors and also sequences
of ineffective or "buggy" behaviors.

The system leaves a student alone if s/he is performing adequately in the
environment. However, if the system determines that a student is floundering or
demonstrating buggy behaviors, the Coach will intervene and offer assistance on the
specific problematic behavior(s). For instance, if a student persists in changing many
variables at one time without first collecting baseline data into the on-line notebook,
the rule that would be invoked would look like the following (paraphrased):

If - The student changes more than two variables at a time prior to
     collecting baseline data for a given market, and it is early in the
     session where the experiment number is less than four,
Then - Increment the "Multiple Variable Changes" bug count by 1 and
     pass the list to the Coach for possible assistance.

If this rule does get fired and the number of times it has been invoked has
surpassed some threshold value (e.g., four times), then the Coach would appear and
say,

"I see that you're changing several variables at the same time. A better
strategy would be to enter a market, see what the data look like before any
variables have been changed, then just change one variable while holding
all the others constant."
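
The paraphrased rule and its threshold can be illustrated with a short sketch. This is not the Lisp implementation used in Smithtown; the class `StudentState`, the function name, and the constant names are hypothetical, and the threshold of four is taken from the "e.g." in the text, read here as "more than four firings".

```python
from dataclasses import dataclass, field

# Values taken from the paraphrased rule; the names are assumptions.
MAX_VARIABLES_PER_CHANGE = 2   # rule applies to "more than two variables"
EARLY_EXPERIMENT_LIMIT = 4     # "experiment number is less than four"
COACH_THRESHOLD = 4            # "surpassed some threshold value (e.g., four times)"
BUG_NAME = "Multiple Variable Changes"

COACH_MESSAGE = (
    "I see that you're changing several variables at the same time. "
    "A better strategy would be to enter a market, see what the data look "
    "like before any variables have been changed, then just change one "
    "variable while holding all the others constant."
)


@dataclass
class StudentState:
    """Hypothetical record of what the system tracks for one learner."""
    experiment_number: int = 1
    baseline_recorded: bool = False
    bug_counts: dict = field(default_factory=dict)


def multiple_variable_changes_rule(state, variables_changed):
    """Apply the rule to one student action.

    Returns the Coach's message once the bug count has surpassed the
    threshold; otherwise returns None and the student is left alone.
    """
    rule_fires = (
        len(variables_changed) > MAX_VARIABLES_PER_CHANGE
        and not state.baseline_recorded
        and state.experiment_number < EARLY_EXPERIMENT_LIMIT
    )
    if not rule_fires:
        return None
    count = state.bug_counts.get(BUG_NAME, 0) + 1
    state.bug_counts[BUG_NAME] = count
    return COACH_MESSAGE if count > COACH_THRESHOLD else None
```

Under these assumptions, a student who changes three town factors at once without baseline data is left alone for the first four firings, and the Coach's message is returned on the fifth.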

In addition to the rules monitored by the system, we developed a list of
performance measures or "learning indicators" that enable us to determine what type 
of actions or behaviors yield better performance in this type of environment. A range 
of learning indicators was created, from low-level, simple counts of actions (e.g., total 
number of activities taken within Smithtown) to higher-level, complex behaviors (e.g., 
number of times a manipulation to an independent variable was made that showed 
an obvious change in the dependent variables). These indicators will be discussed in
a later section and serve as one data source for our study on individual differences in
learning in Smithtown.

The Second Knowledge Base: Economic Concepts in Smithtown

The second knowledge base or "expert" in the system knows about the
functional relationships among economic variables which comprise valid economic 
concepts and laws. The system has a defined instructional domain, which is 
decomposed into key concepts that are organized in a bottom-up manner (i.e., from 
simpler to more complex ideas). An understanding of these concepts should result 
from the student's experiments in the microworld. The hierarchy of domain 
knowledge was developed by first, reviewing six introductory microeconomics
textbooks and determining the presentation order of information and second, 
discussing the optimal ordering of these concepts for student learning in the 
classroom with a college instructor of economics. 

Although a student is not required to learn the concepts in any prescribed 
order, the hierarchy shown in Figure 1 provides the system with information about 
where the student is likely to be with regard to his/her knowledge acquisition. That 
is, the concept of "equilibrium" can be more readily understood after the laws of
supply and demand have been learned. 

For the reader unfamiliar with this domain, we will now describe the basic 
concepts in microeconomics that can be learned using Smithtown. 

Supply and Demand. The buyer's side of the market is called demand. The 
law of demand states that the quantity of a product which consumers would be 
willing and able to purchase during some period of time is inversely related to the
price of the product. If the price of gasoline goes up, consumers will demand a
smaller quantity of gasoline; if the price goes down, consumers will demand larger
quantities. If we graph this relationship, we get what is called a demand curve (see
Figure 2) showing how the quantity demanded of a product will change as the price 
of that product changes, holding all other factors constant. 

The seller's side of the market is called supply. The law of supply is that the 
quantity of a product which producers would be willing and able to produce and sell 
is related to the price of the product by a positive function. If the price of color 
televisions goes up, producers will tend to offer more television sets for sale. If the 
price of color television sets goes down, producers will reduce the number of 
television sets they put on the market. If we graph this relationship, we get what is 
called a supply curve (see Figure 3). A supply curve is a graph showing how the 
quantity supplied of some commodity will change as the price of that commodity 
changes, holding all other factors constant. 

Equilibrium, Surplus and Shortage. There are many factors that influence 
the price of a given product, but when a price is reached where the quantity that 
sellers want to sell is equal to the quantity that buyers want to buy, we say that the 
market is at a point of equilibrium (see Figure 4). Competitive markets always tend 
toward points of equilibrium. If the market price is higher than the equilibrium 
price, buyers will demand smaller quantities than sellers are supplying. This will 
create a surplus. Surpluses of unsold goods will convince sellers to lower their price 
down toward the equilibrium level. If, for some reason, the market price is lower 
than the equilibrium price, buyers will demand larger quantities than sellers are 
supplying, thus creating a shortage. Shortages will lead to price increases, and the
price will rise toward the equilibrium level.

Changes in Supply and Demand. A change in the price of a good will
influence the quantity demanded and supplied and cause movement along a fixed
curve. A change to variables other than price will cause the entire curve (demand or
supply) to shift, depending on which variable is changed and the magnitude of the
adjustment. We refer to the variables in Smithtown that can be manipulated as
"town factors," and they include: per capita income, population, interest rates,
weather, consumer preferences, labor costs, number of suppliers, and the price of 
substitute and complementary goods. For instance, if the population of Smithtown 
was increased from 10,000 to 25,000 persons, then the demand for automobiles would 
increase, resulting in a shift to the right of the demand curve for cars. Alternatively, 
if the number of suppliers of a particular good were to decrease, this would affect the 
supply curve for that commodity, resulting in a shift to the left. These shifts are 
depicted in Figures 5 and 6.

New Equilibrium Point. Competitive markets tend to converge toward 
equilibrium points. Equilibrium, once established, can be disturbed by changes in 
demand and/or supply. If demand and/or supply change, a surplus or shortage will 
result at the original price, and the price will move toward a new equilibrium. A 
shortage at the original price will cause the old price to rise to the new level and 
cause changes in the quantities supplied and demanded. A new equilibrium will be 
established at the second price and the second quantity and may be seen in Figure 7. 
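
The relationships among demand, supply, equilibrium, surplus, shortage, and curve shifts can be made concrete with a small numerical sketch. It assumes straight-line demand and supply curves with invented coefficients; Smithtown's actual market model is not described in this chapter, and the class `LinearMarket` and all numbers below are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LinearMarket:
    """Illustrative market with straight-line demand and supply curves."""
    demand_intercept: float  # quantity demanded at a price of zero
    demand_slope: float      # demand lost per unit increase in price
    supply_intercept: float  # quantity supplied at a price of zero
    supply_slope: float      # supply gained per unit increase in price

    def quantity_demanded(self, price):
        # Law of demand: quantity falls as price rises.
        return self.demand_intercept - self.demand_slope * price

    def quantity_supplied(self, price):
        # Law of supply: quantity rises as price rises.
        return self.supply_intercept + self.supply_slope * price

    def equilibrium_price(self):
        # Price at which quantity demanded equals quantity supplied.
        return (self.demand_intercept - self.supply_intercept) / (
            self.demand_slope + self.supply_slope
        )

    def state_at(self, price):
        gap = self.quantity_supplied(price) - self.quantity_demanded(price)
        if gap > 0:
            return "surplus"
        if gap < 0:
            return "shortage"
        return "equilibrium"


market = LinearMarket(100.0, 2.0, 10.0, 1.0)
old_price = market.equilibrium_price()

# A town-factor change (e.g., a larger population) shifts demand to the right.
shifted = replace(market, demand_intercept=130.0)
new_price = shifted.equilibrium_price()
```

With these invented numbers the original equilibrium price is 30; a price of 40 produces a surplus and a price of 20 a shortage. After the demand shift, the old price of 30 leaves a shortage, and the price rises to the new equilibrium of 40.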

In addition to the above economic concepts, at least two more can be extracted 
from the discovery world, although they are not explicitly recognized by the system: 
cross elasticity of demand and supply. Cross elasticity of demand indicates how a
change in one market affects the demand in a related market while cross elasticity of
supply indicates how a change in one market affects the supply in a related market. 

Maneuvering through Smithtown with On-line Tools

Students can discover regularities in the market by manipulating variables, 
observing effects, and using tools to organize the information in an effective way. 
The on-line tools for scientific investigations in Smithtown include a notebook for 
collecting data, a table to organize data from the notebook, a graph utility to plot 
data, and a hypothesis menu to formulate relationships among variables. Three 
history windows allow the students to see a chronological listing of actions, data, and 
concepts learned. 

First, a student selects a market to investigate from the "Goods Menu" and 
informs the system of his or her experimental intentions by choosing variables s/he is 
interested in from the "Planning Menu." For each new experiment, the system asks 
the student if s/he would like to make a prediction regarding the planned 
experiment. If the student chooses "No," the next menu of options is the "Things To 
Do Menu." If the student responds "Yes," a window appears where specific 
statements can be entered about predicted outcomes to a planned manipulation. For 
example, if the student's experiment was to increase the price of gasoline in order to 
see the repercussions in the market place, one prediction could be: "The quantity
demanded [of gasoline] will decrease." Explorations and experiments are directed
from the "Things To Do Menu," where students are provided with 10 options. Each
option is described below.

1. See market sales information. This window displays information on
the current state of the market.

2. Computer adjust price. The computer will increase or decrease the
price, whichever brings the current market closer to equilibrium. 

3. Self adjust price. Provides the student with an on-line calculator and 
allows the price of the particular good to be changed. 

4. Make a notebook entry. The student selects variables to record, and 
the current values are automatically put into the notebook (see Figure 8). 

5. Set up table. The table package allows the student to select variables of 
interest from the notebook, put them together in a table, and sort on any
selected variable, by ascending or descending order (see Figure 9).

6. Set up graph. The graph utility allows a student to plot data collected 
from his/her explorations and experiments. This provides an alternative 
way of viewing relations between variables (see Figure 10). 

7. Make a hypothesis. The hypothesis menu allows students to make 
inductions or generalizations from relationships in the data they have 
collected and organized. There are actually four interconnected menus of 
words and phrases comprising the hypothesis menu (see Figure 11). First,
the "connector menu" includes the items: if, then, as, when, and, and
the. Next, the "object menu" contains the economic indicator variables
used by the system. The "verb menu" describes the types of change, like
decreases, increases, shifts as a result of, and so on. Finally, the
"direct object menu" allows for more precise specification of concepts
such as: over time, along the demand curve, changes other than price,
etc. As students combine words or phrases from these menus, the
resultant statement appears in a window below. A pattern matcher
analyzes key words from the input and checks whether this matches
stored relationships for each targeted concept. For instance, if the
student stated: As price increases, quantity demanded decreases, the 
system would match that to the law of demand which it understands to 
be the inverse relationship between price and quantity demanded. 

8. Experimental frameworks. There are three "experimental 
frameworks" which provide the student with easy maneuvering within 
and between experiments. These include: Change Good, Same 
Variable(s); Same Good, Change Variable(s); and Change Good, Change 
Variable(s). They are used to change to a new market while holding the 
independent variables the same, change town factor(s) within the current 
market, or to change the town factor(s) and the market, respectively.

9. History Windows. Three history windows are included in the system,
accessible by both the students and the system. As students continue to
interact with Smithtown, histories accumulate, delineating the various
actions resulting from different explorations and experiments. This
summary is maintained in the Student History window. The Market
History window keeps a record of all variables and associated values that 
the student has manipulated. Finally, there is the Goal History window. 
This provides a representation of what the student has successfully 
learned in terms of concepts targeted by the system. 
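
The pattern matching described under option 7 (Make a hypothesis) might be sketched as follows. The actual matcher and its stored relationships are not given in this chapter, so the table `STORED_RELATIONSHIPS`, the function names, and the restriction to three variables and two verbs are all assumptions made for illustration.

```python
import re

# Hypothetical stored relationships: each concept maps to the sets of
# (variable, change) pairs that count as a statement of that concept.
STORED_RELATIONSHIPS = {
    "law of demand": {
        frozenset({("price", "increases"), ("quantity demanded", "decreases")}),
        frozenset({("price", "decreases"), ("quantity demanded", "increases")}),
    },
    "law of supply": {
        frozenset({("price", "increases"), ("quantity supplied", "increases")}),
        frozenset({("price", "decreases"), ("quantity supplied", "decreases")}),
    },
}

# Key words: an economic variable followed by a verb describing its change.
KEY_WORD_PATTERN = re.compile(
    r"(quantity demanded|quantity supplied|price)\s+(increases|decreases)"
)


def extract_key_words(statement):
    """Return the set of (variable, change) pairs found in the statement."""
    return frozenset(KEY_WORD_PATTERN.findall(statement.lower()))


def match_hypothesis(statement):
    """Return the name of the matched concept, or None if nothing matches."""
    found = extract_key_words(statement)
    for concept, accepted in STORED_RELATIONSHIPS.items():
        if found in accepted:
            return concept
    return None
```

In this sketch, the statement "As price increases, quantity demanded decreases" is matched to the law of demand, while a statement pairing a price increase with an increase in quantity demanded matches no stored concept.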

Learning and Individual Differences 

In this section, we describe an exploratory study of learning and individual 
differences in performance in this intelligent discovery world environment. The 
system was able to categorize sequences of student actions as being more or less 
effective and intervened with a hint at times when the student was floundering. 

This study was undertaken with two main goals in mind. One goal was to 
evaluate Smithtown to see if individuals interacting with it actually acquired any of 
the economic concepts embedded in the environment (e.g., the law of demand, 
equilibrium point, and so on). The second goal was to determine the performance 
characteristics of those individuals who were more successful in learning in this type 
of environment as compared to those less successful. Another implicit goal was to 
examine the computer architecture and interface features that facilitated or overly 
constrained an exploratory environment. 

The kind of inference-discovery task that we are studying has been interpreted 
within a problem solving framework by Klahr and Dunbar (1987) who conceive of the 
interplay between hypothesis formation and experimental design phases of the 
discovery process as a search between two problem spaces: a hypothesis space of rules
and an experimental space of instances. "This means that, first, we need to account
for the identification of relevant attributes, for, unlike the conventional
concept-formation studies, our situation does not present the subject with a highly
constrained attribute space for hypotheses. Second, we need a more complex 
treatment of the instance generator, because in our context it consists of an 
experiment, its predicted outcome, and the observation of the actual outcome" (p. 8). 
Klahr and Dunbar place their subjects in a discovery context by first teaching them 
how to use an electronic device (a computer-controlled robot tank called "BigTrak") 
and then ask them to discover how a particular function works. They formulate a 
general model of scientific discovery as dual search that shows how search in the two 
problem spaces shapes hypothesis generation, experimental design and the evaluation
of hypotheses. Strategy differences among subjects were a consequence of the
efficiency of search in the hypothesis space. Successful subjects were classified as
theorists, and others who abandoned hypothesis testing in order to search the
experiment space were classified as experimenters.

In our investigation we also take a problem solving perspective and are guided
in our search for individual differences by certain general findings in problem solving 
performance. For example, Sternberg (1981) makes a distinction between two forms 
of metacognitive performance: global planning and local planning. Global planning 
refers to a strategy that applies to a set of problems and does not focus on the 
characteristics of a particular problem; global planning directs attention to the context
or overall characteristics of the group of problems. Local planning refers to a 
strategy that is sufficient for solving a particular problem within a given set; local 
planning is less sensitive to general context and focuses more on the difficulty of 
carrying out the specific operations of a problem solving task. Sternberg finds that
better reasoners spend relatively more time in global planning of a strategy for
problem solution and relatively less time in local planning. Such a distinction is also
evident in studies of expert-novice problem solving. In studies of writing, Hayes and
Flower (1986) point out that experts attend more to global problems than do novices.
Experts and novices attend to different aspects of a text. Novices focus on the 
conventions and rules of writing; experts make more changes that affect the text's 
meaning. The perceptions of the novices are more local or shallow, and those of the 
expert more global and overall meaningful. The strategies used by novices are local 
strategies concerned with a deletion and addition of words and phrases whereas 
experienced writers are concerned more with strategies that involve changes in 
content and structure. In physics (Larkin, McDermott, Simon & Simon, 1980; Simon 
& Simon, 1978), differences in problem solving between novices and experts also 
relate to surface and deep problem representations. The novice's representation of a 
problem results in a local form of problem solving in which they work with equations 
to solve the unknown. Experts, in contrast, work in a more top down manner 
indicating that a general solution plan is in place before they begin the manipulation 
of specific equations. 

The above findings direct our attention to conceivable differences between good 
and poor inductive problem solvers in terms of the global and local aspects of their 
performance or their attention to specific versus more general features of the problem 
solving task. In a discovery situation, taking a lead from Klahr and Dunbar, we 
translate this distinction to data-driven performance in contrast to behavior which is
more rule- or hypothesis-driven. In our task, an individual starts out with attention
to computer-generated observations and/or to subject-designed experiments. On the
basis of these data, he or she induces generalizations or hypotheses which drive the
further data collection, data organization and experimentation. Based on the 
problem solving literature described above, we can anticipate that good reasoners 
might display rule-driven performance earlier in their discovery activity, and use 
rules as a performance goal in contrast to more sustained attention to data collection, 
although the latter is necessary at certain points in the course of discovery. 

Furthermore, in addition to behaviors at a general level, we must also look at 
more direct performance components. We refer to specific performance heuristics 
manifested by good reasoners that may not be available to others. A good example 
in discovery performance is the heuristic of identifying one variable as a dimension of 
examination and holding all other variables constant while the chosen one is varied 
systematically. Lawler (1982), in discussing computer-based microworlds that use
the Logo language, refers to this as variable-stepping. He points out that Piaget judged
variable-stepping to be an essential component of formal operational thought, a
powerful idea because it is universally useful and crucial to the process of scientific
investigation. In this regard we look for individual differences in our discovery 
worlds that relate to such performance procedures. 
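
The heuristic of varying one variable while holding the others constant can be sketched in a few lines. The function names and the town-factor values below are hypothetical (apart from the population of 10,000 used as an example earlier in the chapter); the sketch only illustrates the heuristic, not how Smithtown represents experiments.

```python
def variable_stepping(baseline, variable, values):
    """Build one experimental setting per value, changing only `variable`.

    `baseline` maps variable names to their baseline values and is left
    unmodified; every returned setting differs from it in at most the
    chosen variable.
    """
    if variable not in baseline:
        raise KeyError(f"unknown variable: {variable}")
    experiments = []
    for value in values:
        setting = dict(baseline)   # copy so other variables stay constant
        setting[variable] = value
        experiments.append(setting)
    return experiments


def changed_variables(before, after):
    """List the variables whose values differ between two settings."""
    return sorted(name for name in before if before[name] != after[name])


baseline = {"population": 10000, "per_capita_income": 20000, "price": 5}
experiments = variable_stepping(baseline, "population", [15000, 20000, 25000])
```

Each of the three settings in `experiments` differs from the baseline only in population, so any change in the dependent variables can be attributed to that one factor.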

As a general caveat in the work reported here, it is important to point out that 
scientific discovery involves a whole array of processes including observing and 
gathering data, finding regularities that describe the data, formulating and testing 
the generalizability and limitations of these regularities, and formulating and testing 
explanatory theories. In this study we are primarily concerned with a subset of these 
processes, principally with discovery that starts with a dataset that can be
investigated and that derives descriptive rules, laws, or regularities from it. As
has been pointed out (Bradshaw, Langley & Simon, 1983) "the generation of data, 
and even the invention of instruments to produce new kinds of data are also 
important aspects of scientific discovery. And in many cases, existing theory, as well 
as data, steer the course of discovery" (p. 971). In this chapter, we consider the path
from data to descriptive laws about data (not necessarily explanatory theories). This 
subset of scientific work is important in discovery and in our concern with individual 
differences in induction from data, and the process by which inductive discovery is 
carried out. Also to be kept in mind is the fact that data-driven induction is not 
completely "pure." Individuals come with previous conceptions of regularities in the
data and they manipulate data and experiment on the basis of hypotheses they 
generate. So, the discovery process that we study here will involve some combination 
of data-driven induction and hypotheses-generated data which guide performance.

Subjects. Three groups of subjects were involved in the experiment and 
consisted of the following: (1) Students who received traditional classroom
instruction in introductory economics, (2) A control group which received no 
economics instruction, and (3) Students interacting with Smithtown. There were ten 
subjects in each group. All subjects were from the University of Pittsburgh and none 
had any formal economics training or previous economics courses. The economics 
group were students who volunteered to participate in an experiment and who were 
enrolled in an introductory microeconomics course. About half of the control group 
consisted of psychology students who took the tests for class credit; the other half 
consisted of students selected from those who responded to ads placed around the 
campus for subjects who had no economics background. They took the tests and 
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received a small payment. The experimental group were individuals who similarly 
responded to ads placed around the University of Pittsburgh campus. They were 
paid for their participation. It should be noted that the chapters covered by the 
economics class during the testing interval corresponded to the identical 
material/curriculum covered by Smithtown (i.e., the same introductory economic 
principles involving the laws of supply and demand in a competitive market). All 
subjects were debriefed about the purpose of the experiment at its conclusion. * 

Test Materials. The test battery on microeconomics was developed by an 
economics instructor at the University of Pittsburgh. The tests were initially piloted 
by individuals who provided feedback about the tests in terms of the clarity of 
instructions, the timing of the tests, and the general level of difficulty. The battery 
consisted of two tests, multiple choice and short answer. After test development, the 
batteries were reviewed by an independent economics instructor for content validity 
(i.e., completeness and accuracy). 

1. MULTIPLE CHOICE TEST: Two alternate forms were created for the pre- 
and post-test. This test involved knowledge of various concepts and principles of 
microeconomics. Subjects had to circle the best answer from the four alternatives 
given. An example of a pre-test item from the test is: 

The supply curve of houses would probably shift to the left (decrease) if: 

(a) construction workers' wages increased 

(b) cheaper methods of prefabrication were developed 

(c) the demand for houses showed a marked increase 

(d) the population increased 

A corresponding post-test item was constructed for each of the pre-test items. 
The counterpart to the above question is: 

Which of the following is likely to move a supply curve for beef to the right (an 
increase)? 

(a) a rise in the price of beef 

(b) a decrease in the price of cattle feed 

(c) an increase in the wages of farm laborers 

(d) a decrease in the price of raw hides 

2. SHORT ANSWER TEST: This test involved the same concepts to be 
defined by the subject for both the pre- and the post-tests. It required elaborated 
knowledge in terms of defining different concepts, coming up with instances of a 
given concept, or drawing a curve on a labelled but empty grid. Two examples from 
the short answer test include: 

(a) What is market equilibrium? 

(b) List as many important factors as you can causing the demand curve for a 
good or service to shift over to the left or right. 

Each answer on the short answer test was scored with reference to a list of 
necessary and sufficient elements. 

Procedures. Subjects from the economics group were administered a pre-test 
battery in their class prior to the lectures and readings on the laws of supply and 
demand. They received about two and one half weeks of instruction on this part of 
the curriculum; they were then retested in the classroom with the post-test battery. 

The control group completed the pre-test battery and then returned in about 
two weeks for the post-tests. This interval corresponded to the pre- to post-test 
intervals for the other two groups. 

The experimental group took the pre-test battery individually, then signed up 
for three additional two-hour sessions. This translated to a total of five hours on the 
computer (Session 1 = pre-test battery plus demonstration of the system, Session 2 
= 2 hours on the computer, Session 3 = 2 hours on the computer, and Session 4 = 1 
hour on the computer and 1 hour for the post-test battery). The sessions were spread 
out over a two week period to correspond to the same time frame as the economics 
group and the control group. Prior to the first real session with the system, students 
were given a Guide to Smithtown in Session 1. This informed them of their goal 
(i.e., to discover principles and laws of economics) and how to best achieve that goal 
(i.e., to imagine themselves as scientists, gathering data and forming and testing 
hypotheses about emerging economic principles and laws). The Guide overviewed 
some of the on-line tools available in Smithtown with examples provided on how to 
use them. Finally, the Guide emphasized that the individual would probably make 
errors or get stuck, but to try to learn from the mistakes. A glossary of terms 
concluded the Guide and the students were free to take it home with them between 
sessions. 

Results 

The first question addressed whether the three groups were initially comparable 
on their pre-test battery scores (i.e., multiple choice, MC and short answer, SA). 
Table 1 shows the summary statistics for the raw data while the mean percentage 
scores for the pre-test battery and for the post-test battery, collapsed across MC and 
SA, are plotted in Figure 12. 

As can be seen in Table 1 and in Figure 12, the three groups are initially 
comparable, while on the post-test, both the economics group and the experimental 
group surpass the control group. First we computed an ANOVA (repeated measures 
design where the grouping factor was treatment group and the trial factors were test 
type and pre- versus post-test condition). The interaction of greatest interest was 
pre/post tests by treatment group, collapsed across tests: F(2, 27) = 2.99, p = .067. 
This shows that the three groups did differ in terms of their pre- to post-test changes 
in scores. We then computed a Hotelling's T² test, contrasting all three pairwise 
combinations of groups on the pre-test battery, yielding the following nonsignificant 
T² values: 

Economics x Control group:       T² = 0.03, p = 0.77 

Control x Experimental group:    T² = 0.11, p = 0.42 

Economics x Experimental group:  T² = 0.03, p = 0.80 
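
For readers unfamiliar with the statistic, the following Python sketch computes a textbook two-sample Hotelling's T² for two dependent measures (such as MC and SA scores) together with its F transformation. It is an illustration under stated assumptions, not the analysis that was run: the scores are invented, the function name `hotelling_t2` is hypothetical, and because the statistical package and the scaling of T² used for the values above are not stated, numbers from this formula need not be on the same scale as those reported.

```python
from statistics import mean


def hotelling_t2(group_a, group_b):
    """Two-sample Hotelling's T^2 for bivariate observations (x, y).

    Returns (t2, f_stat, df1, df2). Uses the textbook pooled-covariance
    form; this is an assumption, not necessarily the study's procedure.
    """
    n1, n2, p = len(group_a), len(group_b), 2

    def centered(group):
        mx = mean(x for x, _ in group)
        my = mean(y for _, y in group)
        return (mx, my), [(x - mx, y - my) for x, y in group]

    (ax, ay), dev_a = centered(group_a)
    (bx, by), dev_b = centered(group_b)

    # Pooled covariance matrix [[sxx, sxy], [sxy, syy]].
    dev = dev_a + dev_b
    dof = n1 + n2 - 2
    sxx = sum(dx * dx for dx, _ in dev) / dof
    syy = sum(dy * dy for _, dy in dev) / dof
    sxy = sum(dx * dy for dx, dy in dev) / dof

    # Quadratic form d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
    d1, d2 = ax - bx, ay - by
    det = sxx * syy - sxy * sxy
    quad = (syy * d1 * d1 - 2 * sxy * d1 * d2 + sxx * d2 * d2) / det

    t2 = (n1 * n2) / (n1 + n2) * quad
    f_stat = t2 * (n1 + n2 - p - 1) / (p * dof)
    return t2, f_stat, p, n1 + n2 - p - 1


# Invented (MC, SA) percent-correct pairs for two small groups.
group_1 = [(40, 30), (55, 45), (50, 35), (45, 50)]
group_2 = [(42, 33), (50, 40), (48, 44), (52, 38)]
t2, f_stat, df1, df2 = hotelling_t2(group_1, group_2)
```

The F statistic would then be referred to an F distribution with (df1, df2) degrees of freedom to obtain a p value.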

After their respective interventions, the groups differed; however, the economics 
group and the experimental group ended up with equivalent post-test scores. It is 
important to note that students in the experimental group spent only five hours 
interacting with the discovery world compared to 2.5 weeks (or about 11 hours) of 
classroom lectures and recitation covering identical curricular information. 

Hotelling's T² analysis allows us to see particular differences between 
independent groups on their test scores. The mean vectors for each group can be 
extracted from the summary statistics, above. First, a comparison between the 
economics students and the control group was made on their post-test scores: T² = 
1.02, p = .003. Thus, as expected, these two groups differed overall in their test 
scores. Individual t-tests on the data showed that the difference is primarily 
associated with the responses on the short answer post-test. The economics students 
had much more complete and articulate responses than the control group (t = 4.28, 
p = .0005). Second, the results from this analysis revealed that the economics group 
and the experimental group performed the same not only on their pre-test scores, but 
on their post-test scores as well: T² = 0.031, p = .774. The experimental group, with 
significantly less time on task, performed comparably with the students in the 
traditional classroom environment. No differences were found between any of the 
individual tests. Third, the control and the experimental groups were compared. It 
was expected that there would be a difference between these two groups in their test 
composites given the experimental group's interaction with the system. This 
comparison also showed a significant difference between the post-tests: T² = 1.24, 
p = .001. Individual t-tests were generated for each of the tests, and the short answer 
post-test, again, was the major reason for the differences (t = 4.25, p = .0005). The 
experimental group had much more complete responses than the control group. 

Individual Differences in the Experimental Group 

The results from the between group analyses suggest that overall, Smithtown 
was effective in teaching a targeted set of microeconomic concepts, comparable to a 
traditional classroom environment. We next examined the experimental 
group data to see how differential interaction with this exploratory world affected 
subsequent learning. In other words, some individuals learned more than others from 
the system, and we wanted to know what it was that the more successful individuals 
did in comparison to the less successful persons in extracting and understanding new 
knowledge. A "successful" individual, in this context, is someone who started out 
with a low pre-test score on the battery of economics tests and, after interacting with 
the system, ended up with a high post-test score. Thus, the two interesting 
comparisons are between those scoring: (1) low on the pre-test and low on the 
post-test, and (2) low on the pre-test but high on the post-test. We were not 
interested in those who scored high on both the pre- and the post-tests as they 
seemed to have started out with some domain-related knowledge. Table 2 shows each 
of the ten experimental subjects with their associated pre- and post-test scores 
(percent correct). 

Our interest is in comparing individuals who scored above the mean gain score 
and below it. Thus, there is a pool of five subjects having large gains and five 
subjects with small gains. These subjects will be discussed after the presentation of 
the learning indicators. 
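
The gain-score split can be sketched in a few lines of Python. The scores below are invented placeholders (they are not the values in Table 2), the subject labels and the function name `gain_summary` are hypothetical, and the use of the sample standard deviation for standardizing gains is an assumption.

```python
from statistics import mean, stdev

# Hypothetical (pre-test, post-test) percent-correct scores; NOT the Table 2 data.
example_scores = {
    "S01": (45.0, 88.0),
    "S02": (48.0, 62.0),
    "S03": (50.0, 80.0),
    "S04": (46.0, 60.0),
    "S05": (52.0, 75.0),
    "S06": (49.0, 66.0),
}


def gain_summary(scores):
    """Return gains, standardized gains, and the above/below-mean split."""
    gains = {s: post - pre for s, (pre, post) in scores.items()}
    avg = mean(list(gains.values()))
    sd = stdev(list(gains.values()))  # sample SD (an assumption)
    z = {s: (g - avg) / sd for s, g in gains.items()}
    large = sorted(s for s, g in gains.items() if g > avg)
    small = sorted(s for s, g in gains.items() if g <= avg)
    return gains, z, large, small


gains, z_gains, large_gain, small_gain = gain_summary(example_scores)
```

A standardized gain above +1 or below -1 corresponds to the "one standard deviation above (or below) the average gain score" criterion used when contrasting individual subjects.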

Table 3 is a listing of the performance measures or learning indicators that were 
computed for each individual across sessions. For this exploratory study, we 
collapsed data from the sessions into a single index for each indicator, although 
changes over time will be informative to look at in the future. Two data sources 
were used in computing these values: (1) detailed computer history lists of all 
student actions, and (2) verbal protocols from each student about justifications for 
each action, what they expected to see after a particular action, and what their plans 
were for further experimentation. 

Comparison of Subjects. BW, CF, HT and OY all began the experiment at 
about the same level of knowledge, measured by pre-test scores, but after the sessions 
with Smithtown, subjects BW and CF (more successful) greatly surpassed subjects 
HT and OY (less successful) on the post-test battery. In terms of gain (i.e., 
post-test score minus pre-test score), BW and CF scored over one standard deviation 
above the average gain score while HT and OY scored about one standard deviation 
below it. 

             Pre-test    Post-test 

BW + CF        47.0        86.7 

HT + OY        47.4        63.1 

The question reduces to: What did BW and CF do, in terms of the indicators, 
that HT and OY did not do? Table 4 shows standardized scores for these two pairs 
of subjects. 

The largest differences (ordered) between these two groups are for the following 
ten indicators: 22, 6, 24, 29, 9, 20, 16, 23, 28, and 13. The difference scores for all of 
these indicators exceed .90 standardized units. 
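
As a rough sketch of how such a ranking could be produced, the Python below takes standardized indicator scores for the two pairs of subjects, computes the difference for each indicator, and keeps those exceeding .90 in absolute value. The numbers are invented (they are not the Table 4 values), the function name `discriminating_indicators` is hypothetical, and taking the absolute value is an assumption made so that indicators where a lower score is better (such as indicator 28) are also retained.

```python
def discriminating_indicators(more_z, less_z, threshold=0.90):
    """Rank indicators by the size of the standardized difference.

    more_z / less_z map indicator number -> mean standardized score for the
    more and less successful pair. Returns (indicator, difference) tuples,
    largest absolute difference first.
    """
    diffs = {k: more_z[k] - less_z[k] for k in more_z}
    keep = [(k, d) for k, d in diffs.items() if abs(d) > threshold]
    return sorted(keep, key=lambda kd: abs(kd[1]), reverse=True)


# Invented standardized scores for four indicators.
more_successful = {22: 0.9, 6: 0.8, 28: -0.7, 1: 0.1}
less_successful = {22: -0.9, 6: -0.7, 28: 0.6, 1: 0.0}
ranked = discriminating_indicators(more_successful, less_successful)
```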

The first observation is that the majority of these indicators are from the most 
cognitively complex set of behaviors delineated, i.e., those in the Thinking and 
Planning category, with six of the difference scores greater than .90. Next, there are 
three main differences between the two groups in the Data Management category. 
Finally, only one significant difference score is from the Activity/Exploration 
category. The progression of behaviors across these three categories goes from simply 
being active in the environment (Activity/Exploration), to efficient (Data 
Management), to finally, effective (Thinking and Planning). 

We will now discuss each of these ten indicators in turn as far as their relation 
to individual differences in performing in this type of environment. The 
between-subject differences will be illustrated in each of the three relevant categories with 
excerpts from their verbal protocols and student procedure graphs, developed to 
depict student solution paths. 

Thinking and Planning Discriminating Indicators 

This category represents the more complex learning indicators relating to 
experimental behaviors. First, the data show that the subcategory of effective 
generalizations was a very good discriminator between these subjects. Overall, BW 
and CF attempted to generalize findings across markets (indicators 22 and 23) to see 
if developing beliefs extended beyond the current market. This included both 
generalizing to related markets (e.g., investigating the effects of a manipulation on 
substitute or complementary goods) and testing beliefs out in unrelated markets to see 
the limits and extent of a particular concept. To illustrate, BW (more successful) was 
careful to try out his developing ideas in different markets to test his hypotheses. In 
the first session, he was investigating the tea market, testing the idea that increasing 
the population caused an increase in the quantity demanded (it actually shifts the 
demand curve). He increased the population and then said: 

BW: Well, the quantity demanded did go up, it was 2550 last time, 
although I would have thought it would have gone up more, twice as many 
people drinking tea [he had doubled the population]. So, quantity 
demanded did go up. There was a bit of a shortage. Well, I'd be pretty 
sure that it [shows the relationship between population and quantity 
demanded]. . . I think it would, but since I haven't tested it out, I can't 
really say. I would change the good to take care of that problem. 

Since some of the town factors have global effects and some have limited 
effects, it is a good strategy to try out things in different markets. After looking at 
the effects of interest rates on the compact car market, then switching to the donut 
market to see if interest affected anything there, BW concluded: 

BW: OK, so I guess interest rates only influence expensive things like 
compact cars or big cars, but not donuts or hamburger buns. I bet there 
are things that influence everything, like income influences everything. 

In contrast, subjects from the less successful group never generalized a concept 
across markets (related or unrelated goods). For any given market, they would make 
a hypothesis from the current data set and presume that it held across all goods, 
without actually testing that notion out. In fact, due to the way the Hypothesis 
Menu was implemented in this version of Smithtown, it was possible to state a 
number of correct hypotheses from a single market, yet that is not good scientific 
behavior. 



The next indicator that differentiated the two groups had to do with using the 
Planning Menu to set up an experiment, specifying variables to investigate, and 
actually conducting an experiment based on those stated variable manipulations 
(indicator 20). Sternberg (1981, 1985) discusses two metacomponents, global planning 
and local planning, isolated from a complex reasoning task. In a study of planning 
behavior in problem solving, he found that more intelligent persons scoring high on 
reasoning tests tended to spend relatively more time than low scoring persons on 
global (higher-order) planning and relatively less time on local (lower-order) planning. 
Poorer reasoners, however, seemed to emphasize local rather than global planning 
relative to the better reasoners. Similarly, Anderson (1987) investigated individual 
differences in students' solutions to Lisp programming problems and found that the 
poorer students tended to be less planful in their problem solving activities. These 
findings are similar to our study in that the individuals who do engage in planning an 
experiment are more successful (measured by our gain scores criterion) than those 
who do not. To illustrate, CF (more successful) decided to test the effects of 
Weather on the demand for ice cream (where Weather can range from 1 = cold and 
wet, to 10 = warm and dry). From the Planning Menu she chose the variables to 
investigate: price, quantity demanded, quantity supplied, surplus, shortage and 
weather. After changing the weather index from a medium, default value of 5 to 10, 
she said, "OK, then that means, I think, there should be an increased demand for 
ice cream." She collected and recorded the data, observed that, indeed, the quantity 
demanded of ice cream went up, and chose the framework: Same Good, Change 
Independent Variable so that she could stay in the ice cream market and manipulate 
the weather variable further. From the new Planning Menu, she selected the same 
variables as before, then changed the weather: "I'm gonna make the weather really 
bad. I'll put it at 1. . . I think there'll be a surplus now, at the other extreme." This 
prediction was confirmed by her data. The two less successful subjects 
evidenced much less front end (higher-order) planning of an experiment 
and typically only selected a few (or irrelevant) variables from the Planning Menu. 
Often, changing an independent variable has effects on certain other variables, and 
those should be focused on in a given experiment. That is, if the population were 
increased, that could have an effect on the demand of a good, and in the long run, on 
the price of that good. 

The next discriminating indicator (indicator 29) reflects the richness and 
tenacity of an individual's actions within an experiment, as measured by the average 
number of actions taken per experimental episode. A thorough, systematic 
investigation of a concept is indicated by more connected actions within an 
experiment while more aimless behavior is seen by fewer connected actions. If a 
person were to move around randomly in this environment, making changes, moving 
on to new things, and so on, with little or no thread of consistency, then each 
experiment would have a small number of actions taken within a given market. 
Subjects BW and CF were not random movers. Their method of investigation was to 
choose a market and do many things within that market, always observing the effects 
of their manipulations, and recording them in the on-line notebook. Thus, the 
average number of actions within their experiments was much greater than for 
subjects HT and OY. In addition, across the three sessions, the more successful 
subjects' number of actions per experiment increased, showing that their experiments 
became more complex as they gained additional domain knowledge. The less 
successful subjects did not demonstrate a similar increase in complexity of 
experiments over time; rather, their average number of actions went up and down 
across sessions. 
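
A minimal sketch of how indicator 29 might be computed from an action history is given below. The log format (pairs of experiment identifier and action label), the action names, and the function name `mean_actions_per_experiment` are all hypothetical; how the history lists actually delimit an "experimental episode" is not specified here, so grouping by an explicit experiment identifier is an assumption.

```python
from collections import Counter
from statistics import mean


def mean_actions_per_experiment(log):
    """Average number of logged actions per experiment.

    log: list of (experiment_id, action) tuples in chronological order.
    """
    counts = Counter(exp_id for exp_id, _ in log)
    return mean(list(counts.values()))


# Invented logs: one focused investigator, one who keeps moving on.
focused_log = [
    (1, "change price"), (1, "record"), (1, "change price"),
    (1, "record"), (1, "graph demand"), (1, "hypothesis"),
]
fragmented_log = [
    (1, "change price"), (1, "hypothesis"),
    (2, "change good"), (2, "hypothesis"),
    (3, "change population"), (3, "hypothesis"),
]
```

With these invented logs the focused pattern averages six actions per experiment and the fragmented pattern two, which is the kind of contrast the indicator is meant to capture.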

Relevant to these results, Sternberg and Davidson (1982; see also Sternberg, 
1985) looked at individual differences in the solution to insight problems where 
individuals were free to spend as long as they liked in the solution process. They 
computed a correlation of .62 between the time spent and the score on the insight 
problems; thus, persistence and involvement in the problems was highly correlated 
with success in solution. They argue that more intelligent persons do not "give up," 
nor fall for the obvious, often incorrect, solutions. 

This activity is captured in "student procedure graphs" that we constructed for 
subjects based on the idea of the problem behavior graphs of Newell and Simon 
(1972) showing student actions and the resulting state of knowledge. A state of 
knowledge is represented by a node and the application of an operator is represented 
by an arrow pointing to the right. The result of the operation is the node at the 
head of the arrow. Vertical lines connecting nodes indicate a return to a previous 
state of knowledge because no new information was supplied. The operators and their 
symbols used for our purposes are listed below. Each operator is recorded above or 
below a horizontal arrow; an operator below the arrow indicates that the variable 
was changed back to its original default or baseline value. Most of the nodes 
(rectangles) contain symbols representing the resulting operation, also listed below. 



Operators & Variables                  Operations 

P  - Price                             R - Notebook Recording 
G  - Good                              S - Supply Curve 
H  - Hypothesis                        D - Demand Curve 
FD - Town factor (demand shifts)       / - Superimposed curves (e.g., S/D) 
FS - Town factor (supply shifts)       X - Error 
GR - Graph 
T  - Table 



Learning Goals, or economic concepts that can be discovered, are indicated by 
symbols beginning with the letter "L" followed by a number (e.g., the law of demand 
is L5). Their meaning can be seen in Figure 1. Figures 13 and 14 are two examples of 
how the student procedure graphs are used to visually illustrate the flow of problem 
solving activity in more and less efficient individuals (in relation to indicator 29). 

Figures 13 and 14 exhibit obvious differences in experimental behavior using 
data from BW and another subject showing below average gain (subject SS) whose 
performance well illustrates the contrast between focused and fragmented search. 
The horizontal movement depicted in the graph of BW's performance (see Figure 13) 
shows much more focused and connected persistent behavior than the vertical, less 
relevant movement in SS's experimental behavior, as seen in Figure 14. In Figure 13, 
BW (more successful) began investigating the large car market by collecting data for 
the market when income was $20,000. At nodes 94 through 97, he changed the 
average income to $30,000 and collected additional data by changing price three 
times. Next, he plotted a demand curve (98) with income at $20,000 and then at 
$30,000, and at node 99, he made a hypothesis that when income increased, quantity 
demanded increased, and the demand curve would move to the right. During the 
period from nodes 100 to 102, he had the computer adjust the price back to 
equilibrium. From 103 to 106, he changed income to $40,000 and again had the 
computer adjust the price back to equilibrium. The subject said, "It's only if you 
change something other than price that you get a new demand curve." Finally, at 
node 107 he hypothesized correctly that demand curves shift as a result of changes 
other than price (i.e., one characterization or description of what causes demand 
curves to shift). 

Contrasting with the systematic, persistent performance evidenced by BW, 
subject SS (less successful) spent a considerable amount of time generating hypotheses 
that were unrelated to the current experiment. Although both subjects were 
attempting to characterize a demand shift, Figure 14 and the following summary of 
actions clearly demonstrate an ineffective experimental procedure. 

At node 73, SS entered the market for gasoline and from nodes 74 to 75, 
changed the price from $1.18/gallon to $1.00/gallon and then down to $0.75/gallon. 
At (76), she hypothesized that "as the price of complementary goods decrease, the 
quantity demanded increases." This was incorrect. She then tried to graph a demand 
curve (77-78) but was unsuccessful. During the period involving the nodes 79 to 80, 
she hypothesized that as price increases, the demand curve shifts down and to the 
left. This was incorrect. The subject then entered the coffee market suddenly and 
without any apparent reason (81-82). At (83), she changed labor costs from 
$4.00/hour to $20.00/hour followed by three more incorrect hypotheses (84-88): 

• As labor costs increase, the quantity supplied decreases and shortage 
increases. 

• Quantity demanded has no relation to labor costs. 

• Quantity demanded has no relation to the price of resources. 

During (89-90) she decided to change the population from 10,000 to 50,000 and 
then returned the labor costs to $4.00/hour. Finally, at (91), she again attempted a 
hypothesis that as population increases, quantity demanded increases and quantity 
supplied increases. This, too, was not quite right. 

A major difference in experimental behavior illustrated here seems to be one of 
staying with a problem until it is solved. Subject BW, when his initial hypothesis 
turned out to be incorrect, did more experimenting to understand more precisely the 
nature of the problem. In contrast, subject SS, who apparently was motivated by just 
getting a hypothesis correct, tried different hypotheses, some of which were wild 
guesses as there was no relation between the stated hypotheses and the experiments 
actually conducted. 

The next indicator to discriminate between more and less effective performance 
was indicator 28: changing only a limited number of variables per experiment, where 
the fewer variables changed, the better the subsequent performance. BW and CF 
(more successful) were very conscientious in changing only one variable at a time per 
experiment. Given the freedom of the environment, it often was a great temptation 
to make changes to multiple variables concurrently; however, the ensuing results are 
then obscured as far as what was actually responsible for the current state of market 
affairs. Subjects HT and OY (less successful) often fell prey to this temptation of 
making multiple changes. For example, while investigating the market for large cars 
and asked what he was going to do, OY responded, "I want to just go back and 
change some stuff." He then proceeded to change interest rates from 15% to 6.7%, 
number of suppliers (i.e., large car dealerships) from 10 to 20, consumer preference 
(i.e., popularity of large cars) from 5 (medium) to 10 (very high), and then back to 5, 
per capita income from $20,000.00 to $25,000.00, and then interest rates from 6.7% 
to 9%. This was all done at one time without collecting any data in between the 
changes. When he was asked about what he would predict would happen as a result 
of all of the changes, he said, "I think they'll still buy the cars, because the income 
is higher now. . .but the interest rates are higher. . .but since they're making more 
income, then I think they can afford it." OY's working memory capacity has 
obviously been overloaded at this point and he fails to even consider the effects of the 
number of suppliers, consumer preference, or any of the potential interactions. Upon 
inspecting the market data, he sees that, in fact, there is an overall surplus of large 
cars. This is viewed as confirmation of his prediction, but it obviously is confounded 
by the fact that he had raised the number of large car dealerships in Smithtown as 
well as the per capita income. These last two actions actually have opposing effects 
whereby increasing the number of dealers would result in a surplus of cars while 
increasing the income would cause a shortage of cars. 
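
One way to make indicator 28 concrete is to count how many distinct variables were changed between successive data collections. The Python sketch below does this for a simplified, hypothetical encoding of an action history; the tuple format, the variable names, and the function name `variables_changed_per_observation` are assumptions for illustration, and the second sequence is only a paraphrase of OY's episode as described above.

```python
def variables_changed_per_observation(actions):
    """Count distinct variables changed before each data collection.

    actions: list of (kind, variable) tuples, where kind is "change" or
    "collect" (variable is None for "collect").
    """
    counts, pending = [], set()
    for kind, variable in actions:
        if kind == "change":
            pending.add(variable)
        elif kind == "collect":
            counts.append(len(pending))
            pending = set()
    return counts


# One variable at a time, with data collected after each change.
one_at_a_time = [
    ("change", "income"), ("collect", None),
    ("change", "price"), ("collect", None),
]

# Several changes before any data are collected (paraphrasing OY's episode).
many_at_once = [
    ("change", "interest rate"), ("change", "number of suppliers"),
    ("change", "consumer preference"), ("change", "consumer preference"),
    ("change", "per capita income"), ("change", "interest rate"),
    ("collect", None),
]
```

A count of 1 means any change in the market data can be attributed to a single manipulation; the count of 4 for the second sequence is what makes the resulting surplus uninterpretable.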

The last indicator falling under the Thinking and Planning category involves 
collecting sufficient amounts of data before making a hypothesis about any of the 
economic concepts (indicator 24). Good scientific methodology involves generalizing 
a concept based on enough examples or instances of a phenomenon rather than 
inadequate data which may include elements of chance, confounding variables, or 
other things. BW (more successful) investigated the concept of "surplus" and its 
relationship to price, quantity demanded and quantity supplied. In the following 
protocol, it is apparent that his investigation looked at the concept from many 
angles, collecting more than enough data before rendering a hypothesis. He had just 
had the computer adjust the price of hamburger buns (raising the price): 

BW: The price went up a lot, and there's a big surplus. . . . Well, as I 
found out before, as price goes up, the quantity demanded goes down, 
quantity supplied goes up. So, by now the quantity demanded has gone 
below the quantity supplied, and there's a surplus. So, the next time 
around I think the price should go back down 'cause there's a lot of 
hamburger buns around here. They'll go on sale. 

He watched the price slowly converge on equilibrium, interrupting the computer 
adjustments with price adjustments himself until he found the equilibrium price 
where, "at $1.55 it came out right. . . no surplus and no shortage." He speculated: 

BW: OK, so I found out when there's no surplus and no shortage the 
price won't change. I could phrase that into a hypothesis also. When 
there's a surplus, price decreases, and when there's a shortage, price 
increases. . . . If surplus is greater than zero, then the price decreases. 

When asked if he could characterize surplus any other way, he responded, 

BW: Well, it's just quantity supplied minus quantity demanded. I can 
state that. I've got enough examples! There's a surplus when the quantity 
supplied is greater than the quantity demanded. 

He then used the Hypothesis Menu and formalized the above statement into a 
successful specification of "surplus." Immediately afterwards, he used the same data 
and logic to characterize "shortage." 
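
The relationships BW articulated (surplus equals quantity supplied minus quantity demanded; price falls under a surplus, rises under a shortage, and stops changing at equilibrium) can be sketched as follows. The linear supply and demand curves and the adjustment rule are hypothetical; they were chosen only so that the equilibrium falls at $1.55, and they are not the equations or the price-adjustment procedure implemented in Smithtown.

```python
def quantity_demanded(price):
    # Hypothetical downward-sloping demand curve.
    return 400.0 - 100.0 * price


def quantity_supplied(price):
    # Hypothetical upward-sloping supply curve.
    return 90.0 + 100.0 * price


def surplus(price):
    """Positive value = surplus, negative value = shortage."""
    return quantity_supplied(price) - quantity_demanded(price)


def adjust_price(price, rate=0.002, tolerance=0.01, max_steps=10_000):
    """Lower the price under a surplus, raise it under a shortage."""
    for _ in range(max_steps):
        s = surplus(price)
        if abs(s) < tolerance:
            break  # no surplus and no shortage: the price stops changing
        price -= rate * s
    return price


equilibrium_price = adjust_price(2.00)  # start from a price that is too high
```

Starting above equilibrium there is a surplus, so each step lowers the price; starting below it there is a shortage and the price rises, and in both cases the adjustment settles where quantity supplied equals quantity demanded.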

In contrast, HT (less successful) was content to make predictions and 
hypotheses based on single events and non-replicated experiments. This was not a 
good strategy for this subject to follow since her data management skills were neither 
efficient nor consistent. Moreover, sometimes she forgot or misconstrued what the 
previous data were, not bothering to go back and retrieve the omitted data. For 
instance, after spending a long time in the final session trying to determine the 
influence of population changes on some of the dependent variables, she conducted an 
experiment which involved decreasing the population of Smithtown from 10,000 to 
5,000. At that time, she was investigating the donut market, and the experimenter 
asked what she expected to see as the result of this population decrease: 

HT: So, less people will eat [donuts]. 

Experimenter: What about quantity supplied and price? 

HT: When population decreases, demand. . . quantity demanded 
decreases, and quantity supplied decreases. . . price increases. 

The market actually depicted the price and the quantity supplied remaining the 
same while the only change was in the quantity demanded, which changed as a 
function of the demand curve shift. Next, HT's actions centered around price 
changes, to get an equilibrium price for the donut market in the smaller sized town. 
She did not replicate the experiment with the population change, and later, when 
attempting to articulate a hypothesis, she remembered erroneous results and showed 
little understanding of cause and effect among the variables: 

HT: OK, so, I think when population decreased, the price decreased. 
That's why there is changes between quantity supplied and quantity 
demanded. 

Experimenter: What was the first thing that happened? 

HT: I think quantity demanded decreased. . . and when quantity 
demanded decreased, price decreased. Quantity supplied. . . let's see, 
population decreased. . . quantity supplied decreased. 

Data Management Discriminating Indicators 

Our more successful subjects, BW and CF, generally exhibited very good data 
management skills, using their notebooks efficiently and consistently. Notebook
entries were typically made following variable changes and variables were included in
their notebooks that had been specified beforehand in the Planning Menu. In
contrast, the less successful subjects (HT and OY) never became fully automatic in
entering data to their notebooks. They continued to forget to record important
information throughout the three sessions and had to rely on the history window to
reinsert forgotten data. They also excluded variables whose values were changed or
that were listed in the Planning Menu. In addition, they continued to omit baseline 
data. This latter omission was a major problem when attempting to attribute causes 
to market conditions. 



Shute, Glaser, Raghavan 36 February 1988 

Indicators 9 and 13 have to do with the total number of notebook entries and
the number of relevant notebook entries made, respectively. In terms of just the
total number of notebook entries, the more entries, the better the performance. In
terms of the type of notebook entries, the more "relevant" notebook entries made,
overall the better the performance, where "relevant" variables are those specified in
the Planning Menu as the variables the subject was interested in exploring and
collecting data on. This measure indicates whether the individual used the notebook
efficiently in terms of recording important information.
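
As a rough illustration of how indicators 9 and 13 might be tallied, the following Python sketch counts total and "relevant" notebook entries. The data layout (each notebook entry as a set of variable names) and the function name `notebook_indicators` are assumptions made for this example and are not taken from the Smithtown implementation. Treating an entry as relevant when it records at least one Planning Menu variable is only one possible reading of the definition above.

```python
def notebook_indicators(entries, planned_variables):
    """Tally total and 'relevant' notebook entries (hypothetical sketch).

    entries: list of sets, each holding the variable names recorded in one
        notebook row (an assumed representation).
    planned_variables: variable names the subject chose in the Planning Menu.
    """
    planned = set(planned_variables)
    total = len(entries)  # indicator 9: every notebook entry counts
    # Indicator 13 (assumed reading): an entry is relevant if it records
    # at least one variable that was specified in the Planning Menu.
    relevant = sum(1 for entry in entries if planned & set(entry))
    return total, relevant


# Illustrative input only; these are not entries from the study.
planned = {"price", "quantity demanded", "quantity supplied"}
entries = [
    {"price", "quantity demanded"},
    {"weather"},
    {"price", "quantity supplied"},
]
total, relevant = notebook_indicators(entries, planned)
print(total, relevant)
```

In this toy input the "weather" row counts toward the total but not toward the relevant entries, which is the distinction the two indicators are meant to capture.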

To illustrate the contrast in types of data recording skills, Figures 15 and 16
show examples of students with better and worse recording skills.

In Figure 15, BW (more successful) entered the tea market and, prior to
changing any variables, decided "to see what the initial conditions are." He
followed the observation with a notebook entry of the baseline data, seen in nodes 1
and 2. At (3), he increased the price of tea from $1.83/box to $2.50/box "to see if
there's a relation between price and quantity demanded and quantity supplied."
This price change was also duly recorded in the notebook. During (4-5), BW
continued to investigate this relationship by decreasing the price two more times,
following each change with a notebook entry. He then graphed a demand curve (6),
and successfully superimposed a supply curve, saving the graph for future reference.
This systematic performance led to the correct induction of the laws of demand and
supply (7-9):

• As price increases, the quantity demanded decreases. 

• When price increases, quantity supplied increases. 

Our less successful subject, HT, demonstrated inefficient data recording skills (see
Figure 16). Data were rather haphazardly entered into the notebook and the subject
did not systematically record variable changes. Figure 16 illustrates some of the
multiple variable changes and subsequent failure to record sufficient data into the
on-line notebook. It shows that the subject started out in the coffee market and
changed the weather conditions from a mediocre value of 5 to a slightly less pleasant
value of 3 to see if that would affect the demand for coffee, seen in nodes 1 and 2.
She predicted that if the weather decreased (became worse), then the price of coffee
would increase. However, since she had failed to record any baseline data for the
coffee market, she was unable to make the appropriate comparison(s).

She then decided to ignore the weather influences, and at (3), changed the
population from 10,000 to 4,000 persons. She predicted that if the population
decreased, then a surplus would result. But, as with the above situation, she had
failed to record the baseline data for when the population was 10,000, so she
reinserted the necessary data from the past experiments. Next, HT tried to graph
some data at node 4: price by quantity demanded and price by population, but in
each case, there was only one data point per variable, thus no line could be drawn.
She then switched to the market for gasoline (5-6) and raised the price from
$1.50/gallon to $4.00/gallon. Since she again had failed to record the data from the
market when gas was $1.50/gallon, she had to reinsert this information into the
notebook. With this additional data, she tried to graph it again and at (7), she
successfully plotted a demand curve. At node 8, she entered the market for large
cars. Her first action there was not to record the baseline data, but to change income
from $20,000 to $30,000. She predicted that "if the income increases, people will

inferences necessarily are generated in the context of the relevant knowledge
structures that experts possess. Predictions, in Smithtown, serve as a foundation or
stepping stones to more general, abstract principles and laws of economics. Our more
successful subjects seemed to be able to work forward toward a goal (i.e., they knew
where they were going) in contrast to our less successful subjects who often got stuck
at the more superficial or data level of investigation. To illustrate, CF (more
successful) was interested in looking at the relationship between the coffee and tea
markets, "because they are similar." First she increased the price of coffee and
collected data on the resulting decreased quantity demanded and increased quantity
supplied. Next, she chose the framework: Change good, keep independent variables
the same, changing to the tea market. Since the price of coffee had been increased,
more people had shifted to drinking tea so the tea market came up with an initial
shortage confirming her initial prediction that, "If the price of coffee increases, then
the quantity demanded of tea will increase." She remained in these two markets
and went on to investigate the concept of a new equilibrium point and demand shifts.
She continued making predictions, observing the data, then proceeded on to
successfully articulate the rules underlying the higher level concepts. The less
successful subjects skipped among markets, failing to make sufficient predictions in
order to test out developing hypotheses that would have led to more economic
concepts being ultimately discovered.






Activity/Exploration Discriminating Indicator

The last category to be discussed has to do with the number of times the 

subject had the computer make a price adjustment toward equilibrium (indicator 8). 

From the beginning sessions, our better subjects immediately grasped the utility of 

letting the computer make price adjustments while both the pattern of the changes it 

made and the effects on the market condition were observed. Subject BW (more 

successful) said, after his first computer change of the price, and when asked if the 

change was in accord with his expectations,

"Well, yes, I thought so. Quantity demanded for hamburger buns was
very high, and there were very few hamburger buns, so, it seems that
suppliers would be able to get more for them. So, the price went up and
there's still a shortage of hamburger buns. If I let the computer adjust
the price again, the price will probably go up again."

He demonstrated an understanding that when the computer changed the price, the

opportunity for observing systematic changes and relationships was provided.

Although he did not have enough data to conceptualize "equilibrium point," he had

started to understand that when shortages exist, prices go up. Our less successful

subjects tried to use the option: Computer Adjust Price, but they did not really

grasp its purpose. It was revealed in the second session that subject HT (less

successful) had no idea what was going on: 

HT: Just now I had the 'Computer Adjust Price.'

Experimenter: Yes. Do you understand what's going on?

HT: No, I have no idea about that.

Experimenter: What happened when you chose that? How did it adjust
the price?

HT: So, the price now increased from $1.70 to $1.90, and the quantity
demanded decreased, decreased just a little bit. The quantity supplied
increased a little bit too. No surplus, and the shortage is 6. Population
is the same. So the price increased. . . .

Experimenter: . . .What would happen if you let the computer adjust
the price again? Would it go up, down, or stay the same?

HT: I don't know much about the 'Computer Adjust Price.' It can
increase or decrease, reduce the price. . . .

The subject continued to have difficulty with this throughout the first two sessions 

(or 4/5 of the entire time with Smithtown), not realizing the benefits of observing the 

computer make price adjustments toward equilibrium. 

Performance differences between our two groups were probably a function of 
the interaction of all of the aforementioned performance indicators. The behaviors 
that differentiated the subjects consisted of: generalizing concepts across markets 
where the generalizations were a result of well thought out and executed plans, 
having sufficient data collected prior to the generalization, engaging in more complex 
experiments within a given market and not moving randomly among markets, (i.e., 
staying in an experiment long enough to extract valuable information), changing 
variables in a parsimonious and systematic fashion, recording important data in the 
notebook from different experiments, and generating and testing predictions that
could lead to the induction of economic principles and laws.

General Discussion



The comparisons between the economics classroom and the experimental group 
in terms of their pre- and post-test results suggest that learning in the exploratory 
world is at least as effective as traditional classroom learning. In fact, when learning
time is compared, the students interacting with Smithtown spent less than half the
amount of time formally learning economics compared to the length of time spent by
the students in the economics classroom. It is possible that a group receiving
classroom instruction and the intelligent discovery world could do even better. This
remains an empirical question.

Our second, more compelling concern, was with the experimental group. In 
particular, we wanted to know how individuals learn or do not learn in this type of 
environment, and on what measures the better and poorer learners differ. The
contrasting pairs of subjects we illustrated differed mostly on measures relating to
thinking and planning skills (i.e., effective experimental behaviors) with fewer but
significant differences in terms of data management skills. The behaviors that 
differentiated the subjects were the following: 

1. Generalizing concepts across markets. The better subjects would try out 
economic concepts in different markets to see if they were supported 
while the less effective subjects would not bother to extend an experiment 
across markets.

2. Engaging in more complex experiments within a given market and not 
moving randomly among markets. Typically, the better subjects had 
many more actions within a given experiment and investigated fewer 
markets overall compared to the less effective subjects. 

3. Changing only one variable at a time and holding all others constant. 
The biggest problem for the poorer subjects was that they persisted in 
changing multiple variables simultaneously. The better subjects changed 
fewer variables at a time, typically just single variables. 

4. Basing generalizations on sufficient data. We set as our criterion having at
least three related rows of notebook entries before using the hypothesis 
menu. The more successful subjects did not attempt to make general 
hypotheses prior to collecting enough data on a given concept while the 
less successful subjects were content to make careless and impulsive 
generalizations based on inadequate data. 

5. Conducting an experiment based on a planned manipulation or set of 
manipulations. The planning and inferencing abilities of the better 
subjects allowed them to set up an experiment and execute it thoroughly 
whereas advance (i.e., higher-level) planning by the less successful subjects 
was rarely evidenced throughout the experimental sessions. 

6. Generating and testing experimental predictions. The better subjects
tended to be more hypothesis or rule-driven (working forward towards a 
goal) while the less efficient subjects were more data-driven in 
experimentation. When evidence does not confirm a hypothesis, further 
experimentation is required to modify the hypothesis. The better subjects 
generally recognized and implemented this approach, while others 
engaged in less systematic activities. 

7. Entering data into the on-line notebook. Better subjects had more 
notebook entries overall compared to the less effective subjects. In 
addition, those entries tended to be more consistent with, and relevant to, 
the focus of their investigation. 

8. Using the computer to make price adjustments of a good towards 
equilibrium. 
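
The sufficiency criterion in item 4 (at least three related rows of notebook entries before the hypothesis menu is used) can be sketched as a simple check. What counts as "related" is not spelled out here, so the sketch assumes that rows are related when they come from the same market and record every variable named in the hypothesis. The row layout and the name `has_sufficient_data` are hypothetical.

```python
def has_sufficient_data(rows, market, hypothesis_variables, minimum_rows=3):
    """Return True if enough related notebook rows exist (hypothetical sketch).

    rows: list of dicts with a "market" name and a "values" dict mapping
        variable names to recorded values (an assumed layout).
    """
    needed = set(hypothesis_variables)
    # Assumed meaning of "related": same market, and all hypothesis
    # variables were recorded in the row.
    related = [
        row for row in rows
        if row["market"] == market and needed <= set(row["values"])
    ]
    return len(related) >= minimum_rows


# Illustrative values only; the quantities are not data from the study.
rows = [
    {"market": "tea", "values": {"price": 1.83, "quantity demanded": 100}},
    {"market": "tea", "values": {"price": 2.50, "quantity demanded": 80}},
    {"market": "coffee", "values": {"price": 3.00, "quantity demanded": 60}},
    {"market": "tea", "values": {"price": 2.00, "quantity demanded": 95}},
]
tea_ok = has_sufficient_data(rows, "tea", ["price", "quantity demanded"])
coffee_ok = has_sufficient_data(rows, "coffee", ["price", "quantity demanded"])
print(tea_ok, coffee_ok)
```

With this input the tea market meets the three-row threshold and the coffee market, with a single row, does not.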



Demographic information was obtained along with the pre-test battery from all 
subjects, and two questions asked: (1) what science courses the subject had taken
since high school, and (2) what their major was. Subject BW (more effective) had
taken just two science courses (physics I and II) and he was a sophomore, majoring in
math. Subject CF (more effective) had taken three science courses (physics I, II, and
III) and was also a sophomore, majoring in electrical engineering. In our less effective
group, Subject HT had five science courses (physics, two semesters of Calculus,
Fortran and chemistry) and she was a freshman, majoring in pharmacy while subject
OY had three science courses (chemistry, biology and physics), was a sophomore,
majoring in electrical engineering. These pairs could have differed in their scientific, 
investigative behaviors as a function of past academic courses or variables relating to 
learning style differences. Thus, according to a hypothesis that different backgrounds 
were a cause of the observed differences, we would have expected the less scientific 
subjects to have taken fewer science courses. This was not the case. In fact, the less 
successful group had an average of 4 prior science courses while our more successful 
group only had an average of 2.5 science courses since high school. In addition, each 
of the subjects was a science major. Of the original ten subjects in our experimental 
group, this same pattern was found. Dividing the subjects into two groups of five
each based on their gain score, the two groups had the same number of declared
science majors in each (i.e., 3 per group). However, the "less successful" group had
taken considerably more science courses since high school (total = 27) compared to 
the "more successful" group (total = 8). Thus, the idea of differential exposure to 
science training seems not to be a major factor in determining who will demonstrate 
better scientific behaviors. 
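
The averages quoted for the four contrasted subjects follow directly from the course counts given above. A short check in Python (the variable names are ours and purely illustrative):

```python
from statistics import mean

# Science courses taken since high school, as reported in the text.
courses = {"BW": 2, "CF": 3, "HT": 5, "OY": 3}

more_successful = mean(courses[s] for s in ("BW", "CF"))
less_successful = mean(courses[s] for s in ("HT", "OY"))
print(more_successful, less_successful)
```

The two means come out to 2.5 and 4, matching the figures in the paragraph above.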

Although this study focused on contrasting subjects in a descriptive and 
exploratory sense, the question arises if the findings generalize to the population at 
large. As part of the Learning Abilities Measurement Program (LAMP) at the Air 
Force Human Resources Laboratory, the first author is currently testing a large
group of subjects (i.e., basic recruits at Lackland Air Force Base, Texas) with a 
modified version of the system which includes 44 performance indicators that are 
automatically tallied in real time and summarized by the computer at the end of a 
three and a half hour session. Using a measure of general intelligence as the 
dependent variable (i.e., the AFQT: a composite score derived from the Armed
Services Vocational Aptitude Battery), listed below are the results of a correlational
study of the indicators with AFQT score that map onto the above descriptions of
major individual differences (N= 527):

1. Generalizing concepts across related or unrelated markets did not 
correlate with AFQT scores in the larger study. Reviewing the code, we 
found that our conditions for these two indicators were much too 
stringent. 

2. Engaging in more complex experiments within a given market was tallied 
by the average number of actions per experiment. This indicator had a 
significant correlation with AFQT score: r= .17; p< .001; thus more
connected actions taken in an experiment were associated with a
higher AFQT score. Related to the nature of the experimentation, we also
tallied the total number of markets investigated. Three regression 
analyses were run on the data: forward, backward and stepwise, with 
AFQT score as the dependent variable. In all solutions, "Number of 
Markets" was one of the five most predictive variables with an inverse 
relationship to AFQT. That is, the fewer markets investigated, the more 
predictive of higher AFQT score. 

3. The average number of independent variables changed at one time (i.e., 
per experiment) had a significant negative correlation to AFQT score (r= 
-.23; p< .001) implying that the fewer variables changed at a time, the 
better the performance. 

4. Making hypotheses based on sufficient data was estimated by the 
indicator computing if the subject had at least three rows of related 
notebook entries before using the Hypothesis Menu. This correlated with 
AFQT score in our larger sample: r=.30; p< .001, thus the better 
subjects relied on more data before formulating general principles and 
laws. 

5. When a subject specifies his/her intentions for an experiment via a
contrived manipulation on a variable or set of variables in the Planning 
Menu and actually conducts the experiment with those variables, this 
indicator is incremented. There was a significant correlation between 
planned performance and AFQT score: r= .16; p< .001 implying that 
the more intelligent persons tended to engage in more higher level, 
advanced planning of an experiment. 

6. Making and testing predictions of experimental outcomes and then 
observing the results for confirmation or negation of the prediction is
effective interrogative behavior and tallied by this indicator. In this 
larger study, the overall number of predictions that a subject made was 
correlated with AFQT score: r= .18; p< .001. 

7. The quantity and quality of on-line notebook entries were two significant
indicators discriminating among subjects. First, the total number of
entries in the notebook was significantly correlated with AFQT score: r=
.26; p< .001, therefore the higher AFQT scores were associated with
more notebook entries overall. Variables entered into the notebook that
had been specified in the Planning Menu was the second indicator,
correlating with AFQT score: r= .30; p< .001. This implies that higher
AFQT scores are associated with consistent behaviors; that is, formulating 
a planned set of variables to investigate and reliably entering those 
variables into the notebook. 

8. The indicator tallying the number of times the student had the computer 
make a price adjustment was not correlated with AFQT score in our 
larger study, confirming our suspicions that it is more of a cognitive style
preference than a learning skill discriminator. 
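
For readers who want to check that correlations of this size are compatible with the reported significance level at N = 527, the standard t transformation of a Pearson correlation, t = r * sqrt(N - 2) / sqrt(1 - r^2), can be computed directly. The sketch below is our own illustration, not the analysis code used in the study. It approximates the two-tailed p value with the normal distribution, a simplifying assumption that is reasonable only because the degrees of freedom (525) are large.

```python
import math


def correlation_t(r, n):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def approx_two_tailed_p(t):
    """Two-tailed p value using a normal approximation (assumes large n)."""
    return math.erfc(abs(t) / math.sqrt(2))


N = 527
reported = (0.16, 0.17, 0.18, -0.23, 0.26, 0.27, 0.30)
for r in reported:
    t = correlation_t(r, N)
    print(f"r = {r:+.2f}  t = {t:6.2f}  p ~ {approx_two_tailed_p(t):.1e}")
```

Under these assumptions even the smallest listed correlation (r = .16) gives a t value of about 3.7, so all of the correlations above fall below the .001 level, consistent with what is reported.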



Other indicators from the larger study that significantly correlated with AFQT 
score included the following. (a) Total number of actions taken in the experimental
sessions. This correlated with AFQT score: r= .26; p< .001, implying that the more
intelligent persons were more active, overall, than the less intelligent persons. This
must be viewed in light of the other indicators relating to the quality of performance,
however, as it is not a matter of simply being "busy" in the environment, but active
in a connected, directed, systematic sense. (b) Total number of economic concepts
learned. This correlated with AFQT score: r= .18; p< .001; thus higher AFQT scores
were associated with learning more concepts in the 3.5 hour session. Finally, (c) the
number of experimental frameworks utilized by the subjects correlated with AFQT
score: r= .27; p< .001, so the experimental frameworks were employed more by the
successful individuals as a planning procedure than by the less successful persons.

Thus, the larger study seems to corroborate, to some extent, the findings from
the descriptive analyses, extending and more precisely delineating individual
differences in learning from this type of environment.

A limitation of the present study was the collapsing of data across sessions for 
this initial investigation. This can result in a loss of information that is valuable for 
looking at individual differences and changes in knowledge and skills over time. 
Another limitation was that the use of difference scores on the economic tests as the 
measure of success was not ideal. That is because our primary focus for Smithtown 
was on the learning of good inquiry skills, and only secondarily on the acquisition of 
economic knowledge. The ideal criterion (and data we plan to collect) should be the 
transfer of skills across domains; i.e., how well students perform in a new
environment with a similar structure/architecture but which differs in content from 
Smithtown. Currently, there are several other systems being developed that fit these 
criteria, and further studies are planned which will investigate transfer of learning of 
these inquiry skills to new domains. 

In general, it appears that in the rather complex task involved in this study, 
many of the behaviors that differentiated successful and less successful subjects are 
similar to those identified in previous studies with both laboratory and more realistic 
tasks. Individual differences in performance in our exploratory environment involved 
the following dimensions: generalization, goal setting and planning, more or less 
structured search, specific performance heuristics, and memory management. Better 
subjects tended to think in terms of generalizing their hypotheses and explorations 
beyond the specific experiment or market they were working on. They conceived of 
a lawful regularity as a general principle and as a description of a class of events
rather than a local description. Better reasoners were more sensitive
to the existence of deeper explanatory principles in addition to local data
descriptions; they appeared to realize that discovery was not only a function of data,
but that they needed to generate some rule that could provide them with a goal for
their actions. In this sense they tended to be more rule or hypothesis-driven than the
less successful subjects. 

Better reasoners also engaged in more connected actions, that is, more structured
search. They conceived of a particular market as a rich environment in which many
actions needed to be taken in order to develop a structured understanding;
disconnected probes did not assist them in their attempt at understanding. Less
successful subjects, on the other hand, moved more frequently between markets.
Their behavior was more fragmented and displayed a breadth of exploration, in
contrast to more depth-like search, in their attempt to establish meaning in a
particular context. 

Planning behaviors differentiated individuals where successful subjects planned 
their manipulations and experiments. Given the opportunity, they would structure a 
plan and then carry it out with specific information. The immediacy of carrying out 
some action was more salient to the less successful subjects, comparable to jumping 
straight to equation solving in physics problems.

The successful individuals in our study employed more powerful heuristics 
compared to the less successful individuals. They manipulated fewer variables,
holding variables constant while one variable was systematically explored. Less
successful subjects did not seem to realize the power of this heuristic and for them it
was a less salient activity. Successful subjects took their time to generate sufficient
evidence before coming to a conclusion while the less successful subjects were more
impulsive and attempted to induce generalizations based on inadequate information. 

The necessity to manage memory was evident in the performance of the better 
subjects. They realized that they needed to store and display the information they
had collected. Their data management performance was goal-driven in the sense
that the data collected were relevant to the current focus of their investigation. This
contrasts with the poorer subjects' data management behaviors which were mostly
inconsistent and often unrelated to an overall goal in their experimentation.

In regard to inductive problem solving, as Greeno and Simon state and as Klahr
and Dunbar describe the interplay between rules and instances, the best learning
strategy is a combination of bottom-up and top-down processing. In our subjects,
this seemed to be the case: the better subjects would predict variable relationships
and then test those hypotheses out, concurrently exploring and collecting data which
led to further generalizations. Our less effective subjects seemed to be limited to a
more data-driven (or bottom-up) approach, often falling short of grasping the larger
picture. This is in accord with findings investigating novice-expert differences in
problem solving (e.g., Larkin, McDermott, Simon and Simon, 1980).

Furthermore, the importance of higher level planning in this inductive 
discovery environment is in agreement with studies of individual differences in 
reasoning tasks (e.g., Sternberg, 1985). Better subjects consistently planned an
experiment and then executed it to completion, according to plan, in sharp contrast
to the more haphazard, less planful approach applied by less successful subjects in
their experimental methodologies.

In conclusion, we have described an initial study of individual differences in 
learning from an exploratory environment where students had the opportunity to
engage in active, discovery learning of economic concepts by manipulating variables
in a hypothetical town and observing the repercussions. Overall, the system worked 
as we had hoped: Tutoring on the scientific inquiry skills resulted in learning the 
domain knowledge as evidenced by performance on the post-test battery. 

We have begun to delineate skills and behaviors which are important to 
scientific inference and discovery learning. Although there is currently not very 
much research being conducted in this area, the behaviors we have identified in this
chapter fit with findings from related research (e.g., Klahr & Dunbar, 1987; Langley
et al., 1987). In addition, these specific behaviors relate to individual differences
found in studies on problem solving and concept formation. From an instructional
perspective, the behaviors we have identified can serve as a focal point for relevant
intervention studies. Related and complementary projects that are planned for the
immediate future include (L. Schauble & K. Raghavan, personal communication,
December 15, 1987): (a) extending the analysis of inference and discovery behavior
across content domains, (b) studying the influence of preconceptions and qualitative
understanding on discovery behavior, (c) identifying intrasubject variability, (d)
coaching on discovery behavior, (e) improving discovery behavior through training
and practice, and (f) implementing a generalized "discovery shell" to make the
discovery environment portable across topics. The work in the above areas should
yield useful educational tools and insight that can be coordinated with science 
education. 

Graph of a typical supply curve

Screen display of Smithtown with notebook entries made

Figure 10
Screen display of the Graph package
with supply and demand curves superimposed

Screen display of the Hypothesis Menu with the law of demand stated

Figure 12

Pre- and post-test scores by treatment group
(collapsed multiple choice and short answer test data)

[Figure 13 diagram not reproduced. Panel label: More successful subject. Legend: P = Price; X = Incorrect attempt at demand shift; R = Records data; L9 = Demand Shift; H = Hypothesis; D/D = Superimposes two demand curves; G = Graph; S/S = Superimposes two supply curves; FD3 = Income.]

Figure 13

Student procedure graph of a more successful subject
where horizontal movement of the graph indicates
market investigation prior to the second hypothesis



[Figure 14 diagram not reproduced. Panel label: Less successful subject. Legend entries: Price; Records data; Graph; Hypothesis; Gas; Coffee; FS2 = Labor Costs; FD2 = Population; L9 = Incorrect attempt at demand shift; L4 = Incorrect attempt at demand curve; L11 = Incorrect attempt at supply shift; Unrelated to investigation. The diagram ends at "End Session 3."]

Figure 14

Student procedure graph of a less successful subject where
vertical movement of the graph indicates a lack of experimentation
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Figure 15 

Student procedure graph of a more successful 
subject showing good data recording skills 
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Figure 16 

Student procedure graph of a less successful 
subject showing poor data recording skills 
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imary statistics: means, standard deviations 
          PSYCHOLOGY        ECONOMICS         EXPERIMENTAL 

          MC      SA        MC      SA        MC      SA 

Mean      11.50   16.20     11.70   15.00     12.00   14.10 
SD         2.88    5.77      2.31    3.83      2.87    3.93 

Mean      13.70   17.90     16.00   25.70     15.20   25.30 
SD         3.65    5.17      2.36    2.54      2.90    1.89 



Table 2: Subjects' Scores on the Economic Tests 
(percent correct) 

Subject              Pre-test   Post-test   Gain 

BW                   53.7       89.9        36.2 
JS                   54.2       75.4        21.2 
[illegible]          53.3       75.4        22.1 
[illegible]          54.2       84.4        30.2 
HT                   43.1       56.8        13.7 
JH                   42.7       73.9        31.2 
OY                   51.7       69.4        17.7 
CR                   77.8       84.4         6.6 
CP                   40.2       83.4        43.2 
[illegible]          42.7       70.4        27.7 

Mean                 [illegible] 
Standard deviation   [illegible] 
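
The Gain column in Table 2 appears to be the post-test percentage minus the pre-test percentage. The following is a minimal sketch of that arithmetic, using only the rows for HT, JH, and OY, whose three values are read directly from the table. The names `scores` and `gain` are illustrative and not part of the original study.

```python
# Pre-test and post-test percent-correct scores for three subjects in Table 2.
# The dictionary layout is an assumption made for this illustration.
scores = {
    "HT": (43.1, 56.8),
    "JH": (42.7, 73.9),
    "OY": (51.7, 69.4),
}


def gain(pre: float, post: float) -> float:
    """Return the gain score (post-test minus pre-test), rounded to one decimal."""
    return round(post - pre, 1)


# Gain score per subject, keyed by subject initials.
gains = {subject: gain(pre, post) for subject, (pre, post) in scores.items()}

for subject, value in gains.items():
    print(f"{subject}: gain = {value:.1f}")
```

The same subtraction can be used to cross-check rows in which one of the three values is hard to read, since any two values determine the third.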
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I. Activity Level 

1. Total number of actions 

2. Total number of experiments 

3. Number of changes to the price of the good 

II. Exploratory Behaviors 

4. Number of markets investigated 

5. Number of independent variables changed 

6. Number of computer-adjusted prices 

7. Number of times market sales information was viewed 

8. Number of baseline data observations of market in 
equilibrium 

III. Data Recording 

9. Total number of notebook entries 

10. Number of baseline data entries of market in 
equilibrium 

11. Entry of changed independent variables 

12. Number of reinsertions of changed independent 
variables 

IV. Efficient Tool Use 

13. Number of 'relevant' notebook entries divided by 
total number of notebook entries, where 'relevant' 
refers to those variables specified in the Planning 
Menu. 
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Table 3, page 2 






14. Number of times the table package was used 
'correctly' divided by the total number of times the 
table was used, where 'correctly' means fewer than 6 
variables tabulated, and sorting was done on 
variables with differing values. 

15. Number of times the graph package was used 
'correctly' divided by the total number of times the 
graph was used, where 'correctly' means plotting 
relevant variables, saving graphs, and superimposing 
graphs with a shared axis. 

V. Use of Evidence 

16. Number of specific predictions made divided by the 
number of general hypotheses made. The larger this 
ratio, the more data-driven the inquiry. 

17. Number of correct hypotheses divided by the total 
number of hypotheses made. 

VI. Consistent Behaviors 

18. Number of notebook entries of Planning Menu items. 

19. Number of times notebook entries of Planning Menu 
items were made divided by the number of planning 
opportunities the subject had. 

20. Number of times variables were changed that had been 
specified beforehand in the Planning Menu. 

VII. Effective Generalizations 

21. Number of times an experiment was replicated. 

22. Number of times a concept was generalized across 
unrelated goods. 

23. Number of times a concept was generalized across 
related goods. 
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Table 3, page 3 



24. Number of times the student had sufficient data for a 
generalization (i.e., at least 3 data points in the 
notebook before using the Hypothesis Menu). 

VIII. Effective Experimental Behaviors 

25. Number of times a change to an independent variable 
was sufficiently large (i.e., greater than 10% 
of the possible range). 

26. Number of times one of the experimental frames was 
selected (i.e., chose 'same good, change variable,' 
'change good, same variables,' or 'change good, change 
variable'). 

27. Number of times the Prediction Menu was used to 
specify a particular outcome to an event. 

28. Number of variables changed per experiment. (In the 
initial sessions, this should be a low number for 
'effectiveness,' while in the later sessions, this 
should be a higher number as the domain knowledge 
increases and the student can deal with 
interrelationships among variables.) 

29. Average number of actions per experiment. This 
should be an increasing function over sessions. 

30. Number of economic concepts learned per session. 
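
Several of the indicators in Table 3 are simple ratios or threshold checks, so they can be sketched in a few lines. The following is an illustrative sketch only, not the scoring code used in the study: the `SessionLog` fields, the function names, and the choice to return `None` for a zero denominator are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionLog:
    """Hypothetical per-subject tallies; field names are illustrative."""
    relevant_entries: int      # notebook entries on Planning Menu variables
    total_entries: int         # all notebook entries
    specific_predictions: int  # specific predictions made
    general_hypotheses: int    # general hypotheses made
    correct_hypotheses: int    # hypotheses judged correct
    total_hypotheses: int      # all hypotheses made


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def ratio_indicators(log: SessionLog) -> dict:
    """Compute the ratio-style indicators 13, 16, and 17 from Table 3."""
    return {
        13: ratio(log.relevant_entries, log.total_entries),
        16: ratio(log.specific_predictions, log.general_hypotheses),
        17: ratio(log.correct_hypotheses, log.total_hypotheses),
    }


def has_sufficient_data(notebook_points: int) -> bool:
    """Indicator 24 criterion: at least 3 data points before the Hypothesis Menu."""
    return notebook_points >= 3


def is_large_change(old: float, new: float, low: float, high: float) -> bool:
    """Indicator 25 criterion: change greater than 10% of the possible range."""
    return abs(new - old) > 0.10 * (high - low)


# Example with made-up tallies for one subject.
example = SessionLog(6, 8, 4, 2, 3, 4)
print(ratio_indicators(example))
print(has_sufficient_data(3), is_large_change(10, 25, 0, 100))
```

Returning `None` rather than zero for an undefined ratio keeps a subject who made no general hypotheses distinct from one who made hypotheses but no specific predictions.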
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Table 4: Z-scores for Subjects on Each Indicator 



Indicator / Group          BW+CF     HT+OY     Difference 

I. General Activity/Exploration Levels 

ACTIVITY LEVEL 
1. 
2. 
3. 

EXPLORATORY BEHAVIORS 
4. 
5. 
6. 
7. 
8. 

II. Data Management Skills 

DATA RECORDING 
9. 
10. 
11. 
12. 

EFFICIENT TOOL USE 
13. 
14. 
15. 

USE OF EVIDENCE 
16. 
17. 



0.32    0.73    0.41 

-0.11   -0.29   0.18 

0.19    0.55    0.36 

0.17    -0.32   0.49 

0.42    0.13 

1.09 
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1.17 
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-0.67 


0.10 


0.77 
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-0.06 
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1.29 
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0.64 


0.27 



III. Thinking and Planning Skills 



CONSISTENT BEHAVIORS 
18. 
19. 
20. 

EFFECTIVE GENERALIZATIONS 
21. 
22. 
23. 
24. 

EFFECTIVE EXPERIMENTAL BEHAVIORS 
25. 
26. 
27. 
28. 
29. 
30. 



0.02 


0.87 
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0.88 


0.00 
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